SDN-4168: Cleanup ipsec state only when ipsec is not full mode #2611

Open: wants to merge 3 commits into base: master

pperiyasamy
Member

@pperiyasamy pperiyasamy commented Jan 9, 2025

This PR makes the following fixes to prevent unnecessary ipsec service restarts and ip xfrm state and policy cleanups while bringing up the ipsec-host pod. This should avoid re-establishing IKE SAs during ipsec pod restarts and let OVN networking pod traffic continue without packet drops.

  1. The ipsec pod cleanup logic has an incorrect check that removes the /etc/ipsec.d/openshift.conf file and the ip xfrm state and policy entries in all cases, but these must be removed only when the ipsec mode is changed from Full to External or Disabled.
  2. The narrowing=yes option no longer needs to be set explicitly because the system default crypto policies are now commented out; otherwise, a TS_UNACCEPTABLE error is seen temporarily when the ipsec service restarts.
  3. An IPsec service restart is needed only for specific IPsec config changes, so the ipsec service is now restarted only when the default crypto-policies conf file is commented out.
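The third point above can be illustrated with a minimal sketch that restarts a service only when a config edit actually changed something. This is not the code from the PR: the helper name `comment_out_line`, the temp file, and the sample include line are assumptions made for illustration, and the real restart command is replaced by an `echo`.

```shell
#!/bin/sh
# Sketch only: helper name, file, and include line are hypothetical,
# not taken from the PR diff.

# comment_out_line FILE PATTERN
# Comments out every line in FILE that starts with PATTERN.
# Returns 0 if the file was changed, 1 if nothing needed changing.
# PATTERN is used as a basic regex and must not contain '|'.
comment_out_line() {
  file="$1"
  pattern="$2"
  if grep -q "^${pattern}" "$file"; then
    # Write to a temp file and move it into place (portable, no 'sed -i').
    sed "s|^${pattern}|# &|" "$file" > "$file.tmp" && mv "$file.tmp" "$file"
    return 0
  fi
  return 1
}

# Demo on a throwaway file: the first call changes it, the second does not.
conf=$(mktemp)
printf 'include /etc/crypto-policies/back-ends/libreswan.config\n' > "$conf"
for attempt in 1 2; do
  if comment_out_line "$conf" "include /etc/crypto-policies/"; then
    echo "attempt $attempt: config changed, a restart would be needed"
  else
    echo "attempt $attempt: config unchanged, restart skipped"
  fi
done
rm -f "$conf"
```

Tying the restart to the return status keeps repeated pod start-ups idempotent: once the line is commented out, later runs leave the service alone.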

The check used when cleaning up ipsec state on ipsec pod deletion is
incorrect and removes state in all cases. This fix removes state only
when the ipsec mode is not Full.
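As a rough sketch of the intended behavior described in this commit message, the decision can be reduced to a single predicate on the mode string. The function name `needs_ipsec_cleanup` is hypothetical, and the cleanup actions themselves (removing the conf file and flushing xfrm state and policies) are deliberately left out; treating an empty or unknown mode as "not Full" is also an assumption.

```shell
#!/bin/sh
# Sketch only: 'needs_ipsec_cleanup' is a hypothetical name, not from the PR.

# needs_ipsec_cleanup MODE
# Returns 0 (true) when host ipsec state should be cleaned up,
# that is, for any mode other than "Full".
needs_ipsec_cleanup() {
  case "$1" in
    Full) return 1 ;;
    *)    return 0 ;;
  esac
}

# Show the decision for each mode named in the PR description.
for mode in Full External Disabled; do
  if needs_ipsec_cleanup "$mode"; then
    echo "$mode: clean up ipsec state"
  else
    echo "$mode: keep ipsec state"
  fi
done
```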

Signed-off-by: Periyasamy Palanisamy <[email protected]>
@openshift-ci-robot
Contributor

openshift-ci-robot commented Jan 9, 2025

@pperiyasamy: This pull request references SDN-4168 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.19.0" version, but no target version was set.

In response to this:

There is an incorrect check while cleaning up ipsec state upon deleting ipsec pod which removes states in all cases, so this fix removes state only when ipsec mode is not full mode.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference label (Indicates that this PR references a valid Jira ticket of any type.) Jan 9, 2025
@openshift-ci openshift-ci bot requested review from trozet and tssurya January 9, 2025 10:33
@pperiyasamy
Member Author

/testwith openshift/cluster-network-operator/master/e2e-ovn-ipsec-step-registry openshift/origin#29232

This reverts commit e0bfa7e.

Signed-off-by: Periyasamy Palanisamy <[email protected]>
Signed-off-by: Periyasamy Palanisamy <[email protected]>
@pperiyasamy
Member Author

/testwith openshift/cluster-network-operator/master/e2e-ovn-ipsec-step-registry openshift/origin#29232

@openshift-ci-robot
Contributor

openshift-ci-robot commented Jan 9, 2025

@pperiyasamy: This pull request references SDN-4168 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.19.0" version, but no target version was set.

In response to this:

There is an incorrect check while cleaning up ipsec state upon deleting ipsec pod which removes states in all cases, so this fix removes state only when ipsec mode is not full mode.

Disruptive events are being thrown for the ipsec pod restart test; this PR is going to fix that.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@pperiyasamy
Member Author

/testwith openshift/cluster-network-operator/master/e2e-ovn-ipsec-step-registry openshift/origin#29232

@pperiyasamy
Member Author

The [sig-arch][Late] operators should not create watch channels very often [apigroup:apiserver.openshift.io] [Suite:openshift/conformance/parallel] test is failing in the run https://prow.ci.openshift.org/view/gs/test-platform-results/logs/multi-pr-openshift-cluster-network-operator-2611-openshift-origin-29232-e2e-ovn-ipsec-step-registry/1877412979897536512 upon ipsec pod reboots.
It happens even when there is no ipsec state/policy cleanup, no pluto restart. Needs investigation...

@pperiyasamy
Member Author

/testwith openshift/cluster-network-operator/master/e2e-ovn-ipsec-step-registry openshift/origin#29232

@pperiyasamy
Member Author

The [sig-arch][Late] operators should not create watch channels very often [apigroup:apiserver.openshift.io] [Suite:openshift/conformance/parallel] test is failing in the run https://prow.ci.openshift.org/view/gs/test-platform-results/logs/multi-pr-openshift-cluster-network-operator-2611-openshift-origin-29232-e2e-ovn-ipsec-step-registry/1877412979897536512 upon ipsec pod reboots. It happens even when there is no ipsec state/policy cleanup, no pluto restart. Needs investigation...

It may be a flaky test; tracking it via bug https://issues.redhat.com/browse/OCPBUGS-46414.

@pperiyasamy
Member Author

/assign @jcaamano @trozet

@pperiyasamy
Member Author

/retest

Contributor

openshift-ci bot commented Jan 13, 2025

@pperiyasamy: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/4.18-upgrade-from-stable-4.17-e2e-azure-ovn-upgrade | ea1d489 | link | false | /test 4.18-upgrade-from-stable-4.17-e2e-azure-ovn-upgrade |
| ci/prow/e2e-aws-hypershift-ovn-kubevirt | ea1d489 | link | false | /test e2e-aws-hypershift-ovn-kubevirt |
| ci/prow/e2e-openstack-ovn | ea1d489 | link | false | /test e2e-openstack-ovn |
| ci/prow/e2e-aws-ovn-single-node | ea1d489 | link | false | /test e2e-aws-ovn-single-node |
| ci/prow/e2e-metal-ipi-ovn-ipv6-ipsec | ea1d489 | link | true | /test e2e-metal-ipi-ovn-ipv6-ipsec |
| ci/prow/e2e-aws-ovn-serial | ea1d489 | link | false | /test e2e-aws-ovn-serial |
| ci/prow/e2e-vsphere-ovn-dualstack-primaryv6 | ea1d489 | link | false | /test e2e-vsphere-ovn-dualstack-primaryv6 |
| ci/prow/4.18-upgrade-from-stable-4.17-e2e-aws-ovn-upgrade | ea1d489 | link | false | /test 4.18-upgrade-from-stable-4.17-e2e-aws-ovn-upgrade |
| ci/prow/e2e-aws-ovn-local-to-shared-gateway-mode-migration | ea1d489 | link | false | /test e2e-aws-ovn-local-to-shared-gateway-mode-migration |
| ci/prow/e2e-network-mtu-migration-ovn-ipv6 | ea1d489 | link | false | /test e2e-network-mtu-migration-ovn-ipv6 |
| ci/prow/security | ea1d489 | link | false | /test security |
| ci/prow/e2e-aws-ovn-upgrade | ea1d489 | link | true | /test e2e-aws-ovn-upgrade |
| ci/prow/e2e-metal-ipi-ovn-ipv6 | ea1d489 | link | true | /test e2e-metal-ipi-ovn-ipv6 |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@@ -407,7 +392,7 @@ spec:
 # When east-west ipsec is not disabled, then do not flush xfrm states and
 # policies in order to maintain traffic flows during container restart.
 ipsecflush() {
-if [ "$(kubectl get networks.operator.openshift.io cluster -ojsonpath='{.spec.defaultNetwork.ovnKubernetesConfig.ipsecConfig.mode}')" != "Full" ] || \
+if [ "$(kubectl get networks.operator.openshift.io cluster -ojsonpath='{.spec.defaultNetwork.ovnKubernetesConfig.ipsecConfig.mode}')" != "Full" ] && \
Contributor

What if mode is "Disabled"? Are you sure the comment above matches what you are doing?
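One way to explore this question is a small truth table comparing the `||` and `&&` forms of the condition. The second test in the diff hunk is cut off, so the sketch below does not guess it: it takes a placeholder argument (`true` or `false`) standing in for whatever that second test evaluates to. The function names are hypothetical.

```shell
#!/bin/sh
# Sketch only: SECOND is a stand-in for the truncated second test in the
# diff; the function names are hypothetical.

# flush_with_or MODE SECOND: the form before the change (joined with ||).
flush_with_or() {
  [ "$1" != "Full" ] || [ "$2" = "true" ]
}

# flush_with_and MODE SECOND: the form after the change (joined with &&).
flush_with_and() {
  [ "$1" != "Full" ] && [ "$2" = "true" ]
}

# Print the flush decision of both forms for every combination.
for mode in Full External Disabled; do
  for second in true false; do
    if flush_with_or "$mode" "$second"; then or_result=flush; else or_result=keep; fi
    if flush_with_and "$mode" "$second"; then and_result=flush; else and_result=keep; fi
    echo "mode=$mode second=$second  ||: $or_result  &&: $and_result"
  done
done
```

With `&&`, a Disabled mode leads to a flush only if the second test also passes, which is the case the question above is probing.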

@jcaamano
Contributor

/lgtm
/approve
/label acknowledge-critical-fixes-only

@openshift-ci openshift-ci bot added the acknowledge-critical-fixes-only (Indicates if the issuer of the label is OK with the policy.) and lgtm (Indicates that a PR is ready to be merged.) labels Jan 14, 2025
Contributor

openshift-ci bot commented Jan 14, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jcaamano, pperiyasamy

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved label (Indicates a PR has been approved by an approver from all required OWNERS files.) Jan 14, 2025
@pperiyasamy
Member Author

/retest

Labels
acknowledge-critical-fixes-only: Indicates if the issuer of the label is OK with the policy.
approved: Indicates a PR has been approved by an approver from all required OWNERS files.
jira/valid-reference: Indicates that this PR references a valid Jira ticket of any type.
lgtm: Indicates that a PR is ready to be merged.
4 participants